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1 .  Introduction 

In  earlier  papers  [1]~(3],  various  aspects  of  analysis  of  possibly  lncom- 
piece  random  samples  have  been  discussed.  These  analyses  all  apply  to  data 
from  a  single  random  sample  only.  The  present  paper  describes  some  extensions 
of  these  methods  when  sets  of  samples  are  available. 

When  more  than  one  sample  is  available,  the  field  of  hypotheses,  alter¬ 
nate  to  that  of  having  complete  camples,  becomes  much  richer.  Some  of  the 
more  interesting  possible  situations  are  discussed,  though  no  exhaustive  general 
theory  is  developed. 

A  secondary  aim  of  this  paper  la  to  lay  foundations  for  later  extension 
of  the  methods  to  cases  when  the  analytical  forms  of  the  distribution^)  of 
observed  random  varlable(s)  are  not  completely  known.  In  such  cases  it  is 
almost  essential  to  have  a  number  of  samples;  useful  results  can  hardly  be 
expected  from  a  single  sample  (even  if  it  is  quite  large).  Techniques  for 
such  problems  are  not  developed  in  the  present  paper,  but  knowledge  of  methods 
appropriate  when  population  distribution  is  known  is  an  essential  preliminary 
to  development  of  such  techniques. 

2.  Notation  end  Preliminary  Formulae 

As  in  the  earlier  papers,  it  will  be  supposed  that  observed  values  of 
independent  continuous  random  variables  with  a  common  (population)  density 
function  f(t)  are  being  used.  The  i-th  sample  (i-1,2, . . . ,u)  Comprises 
xl  ordered  values  "SXir  '  Such  censorin8  88  h/5v<*  occurred  is 


2 


supposed  limited  to  censoring  of  extreme  values,  In  which  the  s^q  least  and 

a^^  greatest  values  of  an  original,  complete  sample  of  size  n^  -  r^  +  s^0 

+  s.  have  been  omitted,  leaving  the  r.  observed  values, 
iri  1 


The  (ordered)  probability  integral  transforms 


f13 

'« ■  L 


f(t)dt 


have  the  joint  density  function 


«  <r1+sio+sirl)J  s1Q 


1-1  Jio1  ‘ir^ 


*u°  (1-yir1>  1 


The  Joint  density  of  the  ■  least  values  Y^,  Y21’***’Yml  fln<^  c^e  ■ 

greatest  values  Ylt  ,Y2j,  , . . .  .Y^  is 
12  m 

•  l'(tl4»10t*lr>1  ...  r.-2  *lr7 

<i>  rr - i —  <»lr  -»u) 1  u-»lr  >  1 

i-X  “1g!  Kr  -2)1  11  lrl  11  IrJ 


«»ruxylr‘i) 


The  symbols  1>(x)  will  denote  the  digamme  function  of  argument  x, 

*<*)  -  ~  Clog  r<x))(-  ~  log(x-l)l) 

Successive  further  derivatives  p^Cx),  p^(x),...  ere  the  trigamma, 
tetrsgassw  ...  functions. 


In  £ij ,  problems  of  estimation  of  total  size  of  a  random  sample,  given 
the  r  laast  (or  greatest)  values  observed  in  the  sample,  were  discussed. 
Bare  these  results  are  extended  to  the  case  when  it  Is  known  that  a  samples 


3 


all  have  the  same  original  size,  n,  but  only  the  least  ri»r2* ' ' * »rffl  va^ues 
are  recorded  in  the  first,  second,...  m-th  samples  respectively.  In  the 


notation  of  Section  2,  this  means  that  -  0;  sir  ■  n-r^. 
From  the  Joint  likelihood  function  of  the  ordered  X's 


(4)  £Q(|n)  —  ]|  *  *  *  *  ^mr  1°^  "  I  F 


n-r.  ri 


L!L^?(1-V  ‘Uf<v' 


we  seek  to  obtain  a  maximum  likelihood  estimator  of  n. 

Regarding  n  as  continuously  variable,  we  obtain  the  equation 

(5)  m  *<fcU)  -  l  *<ft-r.+l)  -  log[  TT  U-Y.  >1 
i-1  i-1  i 

for  the  maximum  likelihood  estimator  ft.  An  approximate  value  of  n  can  be 
obtained  by  making 

*<x|fi  +  |>  -  -  f> 


which  gives 


ft  [1  -  rt(ft  +  j)"1]  v  TT  (l-»lr  > 

i-1  i- 1  i 


Provided  no  Y.  equals  1  (which  has  probability  zero)  equation  (6)  has  a 
1  1 

unique  root  greater  than  max(r^, . . . , r^)  -  y ,  The  appropriate  integer  value 
for  fl  is  that  between  (ft  -  j)  and  (&  +  ■£)•  (If  these  are  integers,  either 
can  be  used.) 

If  r,  -  r,  -  ...  -  r  ,  then  (5)  becomes 

>12  D 


+  1)  -  *(n  -  r  +  1)  -  m  1  log[  TT  (1  “  ?lr)l 

J-l 

ri  <fi  -  j)'1-  nf1  log  ITT  a-  r>i. 

J-O  J-l 


A 


In  this  case,  (6)  becomes 


»  *  r[l  -  TT  (1  -  I.)1'")-1  -  i  . 

i-1  lr  L 


which,  for  m  -  1,  gives 


a  9  r  Y 


-I  1 
lr  “  2  ‘ 


The  Cramer-Rao  lower  bound  for  the  variance  of  an  unbiased  estimator  of 


n  is 


(7)  [  I  ^(n-r.+l)  -  m*(1)(n +1) ]-1 

i-1  1 

For  r^  •  rj  ■  "  r  ■  r,  this  is 

(7)’  a-1[  (n-i+1)  -  -  ■“1C*J1(b-J)"21“1 

j-0 

Unfortunately,  if  (7)  (or  (7)')  is  used  to  approximate  var(h),  it  gives 
(at  leaGt  for  m-1)  unduly  optimistic  (i.e.  small)  values.  We  have  (since 
*ir  **as  *  beta  distribution  vith  parameters  ri>  n-r^+1) 

(8.1)  E[Y^  )  -  n(rrl)’1 

X 


>  var(Yiii)  "  n(n-ri-H)(r1-l)"2(rt-2)’1 

From  (6)"  we  see  chat,  for  m-1 


(9.1) 


Efft)  t  r(r-l)"1n  -  j 


var(ft)  *  r2(r-l)"2(r-2)"1  n(n-r!-l) 


(9.2) 


(Note 


From  (9. 

1)  we  see 

that  there  is 

a  bias  of  about  (r- 

1)  1  n  -  y  .  (Note 

that  the 

true 

value  of 

E[n]  cannot 

differ  from  (9.1)  by 

more  than  1). 

Table  1 

contains  approximate  values  of  the  variance 

and  mean  square 

error  of 

t 

as  given  by  (6)",  and  also  values  of  the  Craradr-Rao  lower  bound 

(from  (7)  with  m-1). 

Table 

1:  Approximate  Variance  and  Mean  Square  Error 

of  n,  and 

Cramer- Rao  Lower  Bounds 

m  ■ 

1 

(Cramer-Rao  Lower 

Efficiency  (Z)  of 

(Approximate) 

Bound)  X  m 

N(m”l, . . . ,m“l) 

r 

n 

Var(ft) 

M.S.E. (A) 

4 

4 

3.56 

4.25 

0.7024 

33 

6 

16.00 

18.25 

4.1427 

46 

8 

35.56 

40.25 

9.6329 

48 

10 

62.22 

70.25 

17.1295 

49 

12 

96.00 

108.25 

26.6279 

50 

15 

160.00 

180.25 

44.6267 

50 

6 

6 

2.16 

2.65 

0.6705 

45 

8 

8.64 

9.85 

3.6046 

60 

10 

18.00 

20.25 

7.9267 

63 

12 

30.24 

33.85 

13.5892 

65 

15 

54.00 

60.25 

24.5866 

65 

8 

8 

1.74 

2.15 

0.6547 

49 

10 

6.67 

7.53 

3.3359 

67 

12 

13.06 

15.53 

7.0739 

71 

15 

26.12 

28.82 

14.5893 

73 

10 

10 

1.54 

1.91 

0.6453 

52 

12 

5.5 

6.25 

3.1748 

71 

15 

13.89 

15.25 

8.5595 

76 

12 

B 

0.6390 

53 

(9 

4.5594 

76 

15 

15 

1.32 

1.65 

0.6327 

55 

la 

view 

of  the  above  results  it 

seems  worthwhile  to 

seek  some  alternative 

estimator  for  n 
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From  (8.1),  (r,-l)Y.  is  an  unbaised  estimator  of  n  with  variance 

i  tr± 

-1  ® 

nfa-r^l)  (^-2)  .  So  if  a^  -  1 


(10)  N(a, . a  )  -  l  a  (r.-l)Y“; 

X  “  1  *  lrj 

is  an  unbiased  estimator  of  n.  The  variance  of  N{*)  is  minimized  by  talcing 
a^  proportional  to  (r^-2) (n-r^+1)  As  n  is  not  known,  it  is  not  possible 
to  calculate  this  value  of  a^.  For  a  first  approximation  it  is  reasonable  to 
take  a^  proportional  to  r^-2,  or  even  Just  to  take  "  *2  "  ***  “  dm  "  ° 

(which  is,  of  course,  optimal  if  r .  »  r2  ”  *  * '  “  rm)* 

Table  2  gives  some  numerical  comparisons  between 


where 


var(N(a. . a  ))  -  n2{  I  (r.-2)]  1-a  f  (r,-l) (r,-2) [  £  (r.-2)] 

1  ®  i-1  1  i-1  i-1 

*i "  (rr2)t  !  <v2>r1 

i-1 


-1  -1-2  — 
var(N(m  ...,m  x))  -  nm  J  (n-r .+1) (r.-2) 

1-1 

-n irdU  ?  (r  -2)-1  -  £ 

2  //  i  '  “ 

m  i-1 


(12) 
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Table  2:  Variances  of  (a)  N(a^,...,am) 

(b)  N(m*^ . m'b 


rl 

r2 

r3 

r4 

(a) 

(b) 

5 

4 

— 

— 

0.2n2-0.72n 

0.2083n2-G.70S3n 

6 

4 

- 

- 

0.16n2-0.72n 

0.1875n2-0.6875n 

7 

6 

- 

- 

0. ln2-0. 6173a 

0.1125n2-0. 6125a 

8 

6 

- 

- 

0.1a2-0.62n 

0. 10416n2-0. 60416n 

9 

8 

- 

- 

0. 0769n2-0. 5799n 

0.0774a2-0. 5774a 

10 

8 

- 

- 

0.0714n2-0. 5816n 

0.0729n2~0.5729n 

7 

6 

6 

- 

- 

0..\25n  - 

0. 625o 

8 

8 

- 

- 

0.033n2  - 

0.583n 

10 

10 

- 

- 

0.0625n2  -  0 

i.5625n 

5 

5 

4 

- 

0.125n2-0.46875n 

0.1296n2-0.4630n 

5 

4 

4 

- 

0 . 1 428n2-0 . 4 89  3n 

0.1482o2-0.4815n 

7 

7 

6 

- 

0.0714n2-0. 4082a 

0.0722n2-0.4056n 

7 

6 

6 

- 

0.0769n2-0.4142n 

0.0778n2-0.4111n 

10 

10 

8 

- 

0.0455n2-0.3843n 

0.0463n2-0.3796n 

10 

8 

8 

- 

0.05G0o2-0. 3900a 

0.0509n2-0. 3843a 

10 

8 

6 

- 

0.0556a2-0.4136n 

0.0602n2-0. 3935a 

6 

6 

6 

- 

O.CS33a*  - 

0. 4167a 

8 

8 

8 

- 

0.0556n2  - 

0. 3839a 

10 

10 

10 

- 

O.C417a2  - 

0. 3750a 

5 

5 

4 

4 

0 . 0909n2-0 . 34  71n 

0.09375n2-0.34375n 

5 

4 

4 

4 

0.0llln2-0. 3703c 

0.1146a2-0.3646n 

7 

7 

7 

6 

0.0526n2-0.3047n 

0. 056 ln2-0. 3031a 

7 

6 

6 

6 

0.0588n2-0.31I4n 

0.0594n2-0.3094n 

10 

10 

10 

8 

0.0333a2-0. 2857n 

0.0339n2-0.-839n 

10 

8 

8 

8 

0.0385n2-0.2929n 

0.0391n2-0.2891n 

10 

10 

8 

6 

0.03S5a2~G. 3047n 

0. 0 37 5a2-0. 2875a 

10 

8 

8 

6 

0. 0417n2-0. 3056n 

0. 0443n2-0. 2943a 

10 

6 

8 

6 

6 

.0.0455n2-0._3182n  p_. 

0.0495n2-0. 2995n  . 

6 

6 

6 

0.0625n2  - 

0. 3125n 

8 

8 

8 

8 

0.0417n2  - 

0.2917a 

10 

10 

10 

10 

0.03125n2  - 

0.2812So 

8 


£c  can  be  seen  that  little  is  lost  by  using  N(m  ^.....m  *),  at  any  rate 
for  the  amount  of  variation  in  values  of  r  shown  in  the  table.  The  last 
column  of  Table  1  gives  the  efficiency  of  N(m  m  relative  to  the 


Cram£r~Rao  lower  bound,  in  cases  when  r. 


r-  -  ...  “  r  »  r. 
2  nt 


We  note  that  in  the  case  of  symmetrical  censoring  with  r^  •  r^  •  •• 

r  •  r,  n  «  r+2s  (s„  ■  s  «  s),  the  maximum  likelihood  estimator  of  n 
n  Or 


satisfies  the  equation 


n 

*(7<fi-r)+l)  -  *0*1)  -  7  »-1  I  logIYu(l-Tlr)J 


The  statistic 
-1 


i-1 


ir 


m'\r-2)  l  (Ylr-Y  ) 
j«l 


-1 


is  an  unbiased  estimator  of  n.  It  has  variance 


nm  ^(r-3)  ^(n-r+2). 


3.  Tests  of  Sample  Size 

If  we  wish  to  teat  the  hypothesis  that  the  available  data  represent  the 
whole  of  the  original  samples,  and  still  to  confine  ourselves  to  situations 
where  the  original  sample  sizes  are  all  the  same  (n^n^". • .”nB*n) ,  then  we 
need  consider  only  cases  for  which  r. “r0™. . .**r  .  For  if  some  r'a  are 
smaller  than  others  then  (under  the  condition  •  nj  *  •••  •  «  n)  the 

corresponding  samples  must  be  incomplete  and  there  is  no  need  for  a  test. 

It  is  shown  in  [2]  that,  for  a  single  sample,  s  test  with  critical  region 
of  form 

Y1  <1'Yr)  >  Ca 

is  uniformly  most  powerful  with  respect  to  all  alternatives  to  the  hypothesis 

a  >s  -  0.  for  which  s -/a  -  6.  If  the  number  of  available  observations 

Or  or 

is  the  same  for  all  samples  (r^*,r2"««*_r,r“t)  and  the  complete  sample  size 
(n  -  r+s0+*r)  is  also  the  seme  then 
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>  C 

a 


is  uniformly  most  powerful  with  respect  to  all  alternatives  for  which  a^ar 
As  particular  cases  we  have  (i)  censoring  from  belov,  for  which  e^  ■  0  and 
the  critical  region  is  of  fora 
m 


rr* 

i“l 


il  >  Ca* 


sad  (ii)  eyrmetrioal  censoring,  for  which  Sq 
is  of  form 


sr»  and  tha  critical  region 


e. 


IT  -  V 

Of  course  censoring  fxvm  above  (b^-0)  can  be  treated  by  similar  methods 
to  those  appropriate  to  censoring  from  below. 

The  values  of  Cft  have  to  be  chosen  to  give  the  required  significance 
level,  in  each  case. 

In  the  subsequent  discussion  we  will  consider  a  rather  more  general 
situation  in  which  the  hypothesis  tested  is  that  the  complete  sample  size  is 
ng(&  nax(r^. . . rffi))  against  alternatives  that  it  exceeds  n^.  We  will  however 
usually  restrict  ourselves  to  the  case  rfr2m' *  •*rn“r»  though  this  is  no 
longer  the  only  esse  of  interest.  Tha  hypotheala  of  "completeness"  corresponds 
to  taking  nQ  equal  to  r. 


3.1  Censoring  from  Balow 

From  (2) ,  putting  s^q  ■  n-r  and  a^r  •  0  we  see  that  the  likelihood 
ratio  of  n  ■  n*  against  n  •  n^  is 
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-  constant.  *  (  ft  * 
£(X|n0)  i-1 


So  a  test  with  critical  region 


TT  *u  >  co 


la  uniformly  most  powerful  with  respect  to  the  set  of  alternatives  hypotheses 

n  >  Dq,  given  this  kind  of  censoring.  This  is  so  even  if  the  r^’s  are  not 

all  equal (provided  of  course  Oq  i  max( r^, ....  r.»* 

Each  Y^  has  a  beta  distribution  with  parameters  n-r+1,  r.  The  dia¬ 
ls 

tribution  of  ^  Y^  is  complicated,  but  a  useful  approximation  may  be 

constructed  by  considering  the  distribution  of  C  «  -2  logd  I  Y^)  - 

-2  l  log  Y. .  The  cumulant  generating  function  of  -log  Y  is 
i-1  °  1 

-f  log  Y  __ 

(14)  loga  E[e  *  l)  -  loge  E(YtT) 

- 

-  log  r(n-rfl-t)  -  log  r(n+l-T> 

-loser  (n-r-i-1)  +  loger(a^l) . 

Hence  the  s-th  cuaulant  of  -log#  Y,^  is 


1(-logeY1)  -  (-l)*t*C9"1>(n-i+l)  - 


r-1 

(s-l)I  l 
J-0 


r-1  . 

So  -2  log^  ie  distributed  as  J  (n-j)  where  W^q,.  •  '»Wi>Xwi 


independent  variables,  snd 
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(16) 


G  -  -2  £  log  Y  is  distributed  as 

j  _  t  Cl 


i-1 


I  T~l  (n-j)"1  W  -  ^  (n-J)"1  W 
i-1  j-0  13  j-0  3 


where  are  independent  X2m  variables. 

So,  to  test  the  hypothesis  n  -  (against  alternatives  n  >  n^)  we 
use  the  critical  region 

*2  X  *  «. 

where 


r-l 


(17) 


l  VJ)"1  V  <  C  ]  - 
j-0  °  3  a 


(Note  that  it  is  the  lower  tall  of  the  G-distrifcution  which  gives  significance.) 

It  is  possible  to  give  explicit  formulae  for  the  probability  in  (17)  (see 

2 

Appendix  1).  since  each  la  Ustributed  as  a  x  with  an  even  number  of 

degrees  of  freedom,  but  except  for  unrealistically  small  values  of  r  and  m, 

these  would  not  be  useful  for  purposes  of  calculation.  Useful  approximations 

t-1  _i 

(at  least  for  «£2)  can  be  achieved  by  regarding  (nQ-j)  Wj  03 

2 

approximately  equivalent  to  cxv  ,  with  c  and  v  chosen  to  give  the  correct 
first  and  second  moments,  i.e. 


(ia.i) 

r-l  ,  r-1  .  . 

c  -  [  l  (Vi)"2]!  l  (nQ-j)  j”"1 

j-0  0  j-0  0 

(18.2) 

v  -  2m[  l  (nQ-j)  Yl  I  (n0-j)"Z] 
j-0  j-0 

Approximate  values  of  the  power  can  be  obtained  by  replacing 
For  m  -  1,  exact  values  are  easily  calculated,  as  shown  in  [3]. 
mat ion  would  be  expected  to  improve  as  m  increases  (In  that  the 


nQ  by  n. 
The  approx t- 
Wj ' a ,  and 
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also  Che  approximation,  both  become  more  nearly  normal).  Setter  approximation 
would  also  be  expected,  for  given  r  and  m,  as  n  increases,  because  the 
coefficients  (n-j)  *  are  in  ratios  closer  to  1.  Investigations  summarized 
in  Appendix  II  confirm  these  expectations 


3.2  Symmetrical  Censoring 

The  first  part  of  discussion  follows  exactly  similar  lines  to  that  in 


Section  3. 

1*  and  la  therefore  condensed.  The  critical  region 

a 

(19) 

TT  nutt-ilt)j > ca 

i-1 

with  C 

a 

chosen  so  that 

(20) 

Prt  IT  lYil(1‘Yir>3  >  C«,n  "  n0]  “  a 

gives  a  test  of  the  hypothesis  n  «■  Hq  which  is  uniformly  most  powerful  with 

respect  to  alternatives  n  >  Hq,  given  that  censoring  is  symmetrical.  This 

also  is  true  even  if  the  r^'s  are  not  all  equal,  provided  n^  max(r^,...,rffl). 

From  (3),  with  •  r,  siQ  -  *ir  •  -|<a-r),  we  obtain  the  cumulant 

generating  function  of  -log  [Y  (1-Y.  )]  as 

e  a.x  ir 


(21) 

Hence,  if 

(22) 

Since  -|(n-r) 


2[loger(~  +l-T)-loger(~  +l)]-Uoger(n+l-2x)-logcr(n+l)]. 

a 

G--2  l  log  [Y  (1-Y  )] 

1-1  e  11  lr 

xg(C)  -  m(-l)328[2*(8_;i)(~ +1)  -  2V5~1)(n-*-l)) 
must  be  an  integer 

4(o+*r)"l 

*(8"1>(I~  +1)  -  *(6'1)(irt-l)+(-l)8(s-l)!  I  (n-j)"8 
*  j-0 


and  (22)  can  be  written 
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(22) '  k  (G)  -  n.2S+1((8-l)j‘(nfr)  1<n-j)"8+(-l)6+1(28'1-l)^(8“:l)(n+l)] 

j-0 

Although  we  do  not  have  a  simple  representation,  as  in  section  3.1,  it 

2 

seems  reasonable  to  approximate  the  distribution  of  G  by  that  of  cxv  with 

(23)  c  -  2-  «2(G)I^1(G)f1;  v  -  21^(0) ]2(*2(G) ]_1. 

3.3  General  Purpose  Tests 

If  the  value  of  O^s^/s^)  Is  not  known,  we  do  not  have  a  uniformly  most 
powerful  test  of  sample  size.  In  [2]  a  teat  of  completeness  with  critical  region 

Y.  +  (1-Y  )  >  C 
1  r  a 

with  ICa(2,r-l)  ■  1-a,  has  been  proposed,  for  the  single  sample  case.  This 
test  was  derived  on  heuristic  arguments,  but  ha3  been  shown  [2  ]  to  have  pro¬ 
perties  rendering  it  a  useful  "general  purpose"  test  when  8  is  not  known. 

Put 


Vi  "  Yil  +  (1_Yir)  (i  “  1>2 . n>* 

The  density  function  of  is 

[Btf+a-r.r-Df1  v^^a-v^1"2  (0  <  v±  <  1) 

and  so  fi  have  the  likelihood  ratio 


*(71,.,.,Vclln,) 

*<VX . Vjn0) 


B(2+nQ-r,r-l) 


B(2+n’-r,r-l]_ 


TTv, 

i-i  4 


n--n„ 


So  a  uniformly  most  powerful  test  of  the  hypothesis  n  »  Uq  (if  only  VX,...,VQ 
are  to  be  used)  against  the  set  of  alternatives  n  >  Oq, 
the  critical  region 


is  obtained  by  uclng 
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(24)  TT  V>  >  C 

i-1  1  a 

n 

with  Pr[  j”T  Vi  >  Ca|nQ3  -  o. 

Again  this  is  so  even  when  there  are  different  numbers  ri»r2» *  * ’ *rB 

observations  available  in  the  different  samples,  and  ve  now  give  some  formulae 

appropriate  to  this  more  general  case. 

The  value  of  Co  depends  on  nQ,  a,  , . . . ,rB*  In  order  to  develop  use- 

n 

ful  approximations  we  use  the  criterion  G  -  -2  £  log  V.. 

1-1  e 

The  cuaulant  generating  function  of  -log^  is 


logeE(V~T]  -  log^ 


B(2+n-r1-T,r1-l) 
_B  (2+n-r^ ,  r^-l) 


Bence  the  s-th  cuaulant  of  G  is 


l  2*  (-1)8  [^8-L)(2+n-r.)  -  *U“1)(n+l)]-  2*(s-l)l  j>  f  (n-J) 

i-1  1  1-1  J-0 


(s-1) , 


»  V2 


The  distribution  of  G  is  that  of 
a  ri"’2 

(26)  l  l  (n-j)"1  V 

i-1  j«0  J 

2 

where  the  U's  are  Independent  X£  variables. 
(26) can  also  be  expressed  as 


(26)*  [  (n-j)*1  I(J)W  -  l  (n-j)_lW 

1-0  1  3  j-0  2 

where  &  -  maxtr^r^ . .  .,rn) ;  and  denotes  summation  over  all  i  for 

*  2 

which  r.  1  j+2.  The  W  's  era  independent  x?_  variables,  with  ra  - 

1  J  J 

number  of  r^'a  greatar  than  or  equal  to  (j+2). 

If  r,-r,-...-r  -r,  then  (26)'  becomes 

1  a  m 

r-2  . 

I  (n-J)  \ 

J-0  J 


(27) 
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2 

with  Wq,  independent  Xnm  variables.  (Compare  (16).) 

As  in  section  3.1,  the  distribution  of  G  may  be  approximated  by  that 
2 

of  cxv  with,  in  this  case 

(28.1)  c-  [^2  (n-jrta2  (n-j)'1)'1 


J-0  3-0 

(28.2)  v  -  2atrf  (n-j)"1]2  if  (n-J)"2)"1. 

j-0  j-0 


Variation  in  accuracy  with  a  and  n  will  be  exactly  similar  to  that  in 
Section  3.1. 


.  ♦.  Some  Numerical  Comparisons 

Table  3  gives  some  values  of  -2  log  Cq  qj  for  each  of  the  three  tests 

(13),  (19)  and  (24).  Values  in  parentheses  were  calculated  from  approximations 
2 

by  (i)  using  cxv  approximation  and  (ii)  making  an  ad  hoa  correction  based 
on  comparison  between  exact  and  approximate  values  in  cases  when  the  former 
was  calculated.  The  (exact)  values  for  m  ■  1  (case  (b))  are  taken  from  [3). 

Table  3:  Critical  limits  for  (a)  one-sided  (b)  symmetrical  and 
(c)  general  purpose  tests  (Values  of  -2  log^  Cy 

_ 0,05 _  _ 5 — 


r 

m 

(a) 

(b) 

(c) 

4 

1 

1.281 

4.435 

0.572 

2 

3.821 

(10.66) 

1.839 

3 

6.734 

(17.40) 

3.318 

4 

(9.65) 

(24.40) 

(4.90) 

10 

1 

2.703 

7.115 

1.862 

2 

(6.85) 

(16.50) 

(4.72) 

3 

(11. A5) 

(25.55) 

(7.79) 

4 

(16.15) 

(36.80) 

(11.00) 

Table  4  gives  powers  of  those  tests,  with  a  -  0.05,  with  reepset  to 
alternative  hypotheses  n  -  r+2,  r+6,  r+10,  Values  in  parentheses  were 
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2 

obtained  by  using  cxv  approximation,  with  the  Cq  values  corresponding  to 

Table  3.  (In  Appendix  II  there  is  some  evidence  indicating  that  as  n 
2 

Increases,  the  cy^  approximation  rapidly  increases  in  accuracy.)  For  the 
one-sided  1  and  "symmetrical"  tests  the  "best"  forms  of  alternatives  are 
assumed,  i.e.,  s^  *0,  *r  “  n-r  for  "one-aided",  8q  “  8r  “  4(n-r)  for 
"symmetrical".  For  the  "general  purpose"  tests,  power  depends  only  on 
(»Q+ar) (-n-r) . 


Table  4:  Power  of  tests  (a),  (b)  and  (c)  (5Z  Significance  Level) 


Power  of  (a) 


r 

n 

m  - 

1 

2 

3 

4 

4 

6 

.294 

.557 

(.749) 

(.872) 

10 

.780 

.989 

(*> 

(*) 

14 

.955 

* 

(*) 

(*) 

10 

12 

.364 

(.636) 

(.829) 

(.923) 

16 

.907 

(*) 

(*) 

(*> 

20 

.988 

(*> 

(*) 

(*) 

Power  of 

0>> 

r 

n 

m  • 

1 

2 

3 

4 

4 

6 

.206 

(.364) 

(.522) 

(.658) 

10 

.594 

(.933) 

(.996) 

(*> 

14 

.841 

(*) 

(*) 

(*) 

10 

12 

(.269) 

(.442) 

(.664) 

(.795) 

16 

(.798) 

(.992) 

(*) 

(*) 

20 

(.982) 

(*) 

(*) 

(*) 

Power  of 

(c) 

r 

n 

e  • 

1 

2 

3 

4 

4 

6 

.167 

.296 

(.420) 

(.530) 

10 

.470 

.827 

(.958) 

(.991) 

14 

.716 

.978 

(.999) 

(*> 

10 

12 

.238 

(.419) 

(.547) 

(.677) 

16 

.732 

(.969) 

(.996) 

(*) 

20 

.949 

(*) 

(*) 

(*> 

(*  denotes  "over  .9995") 
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The  figures  in  Table  4  exhibit  the  rapid  increase  in  power  with  a,  the 
number  of  samples  In  the  sequence. 

Such  powers  will  n> -c  be  attainable  if  the  population  density  function 
f(t)  is  not  known.  However,  they  do  indicate  the  possibility  that  with  a 
sequence  of  moderate  length,  good  power  may  be  obtained  even  when  f(t)  is 
not  completely  known-  for  example  when  the  form  of  f(t)  is  known,  but  some 
parameters  have  to  be  estimated. 
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We  next  consider 


k 

l 

j-1 


„U) 


which  may  be  regarded  as  the  sum  of  two  independent  random  variables,  each 
distributed  as  Y^.  From  the  mixture  representation  (A. 2)  we  see  that  the 
distribution  of  Y^  is  also  a  formal  mixture,  as  set  out  in  the  following 
table: 


Distribution  Weight 

*4  "f’  !iV4>  bj 


<1) 


(1) 


WJ  +w 


2bj  V 


a  <  j') 


w  c i > 

Again  using  (A. 2),  the  distribution  of  (a^  a  formal 

mixture  of 


(A.  3) 


2  -1 

a^  x2  with  weight  (l-a^./a^ 

2  “1 

aj,x2  with  weight  (1-a^/a.j,) 


Hence,  for  y  >  0 
(A.  4) 


-*y/®4 


-%y/a. 


Pr[Y,<y]  •  l  b*(l  -  e  J  ~(%y/a.)e 
j-1  3  3 

,  -%y/a,  ,  ~%y/a  , 

*  *  4  a“ )  V  *  1 


k  -Vy/a,  „  _ 

-  1  -  [  e  3[b*(l+$y/a,)+2b  T  (l-a,,/a.)  S,.] 

j-1  3  3  3jVj  3  3  3 


Wa  now  briefly  consider 

'» '  j,  Vf’ 
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which  has  the  formal  mixture  distribution  set  out  below: 


VjU>livI> 

b3 

J 

n)*J' 

(j  S  j’) 

6  b^.y 

CJ  <  3'  <  3”) 

To  obtain  a  representation  of  the  distribution  of  (a  wj^+a.  ,W^P)  we 


j  j  y r 


note  that 


a  wj2>  +  a  -  a  wj15  +  (a  W(1)  +  a^.wjP  ). 

j  1  3  3  j  3  32  33 

The  distribution  of  (a  +  a  )  can  be  obtained  from  (A. 3).  We  find 

J  Jj  3  3 

that  (a.W^2^  +  a.  ,W^P)  is  distributed  as  affixture  of 

J  J  J  J 


(A.  5) 


aj>-4  with  weight  (l-a^,/a^> 


-1 


-1, 


8jX2  with  weight  (1-a^ ,/a^)  (l-a^/a^,) 

I  2 

[  a^,x2  with  weight  (1-a^/a^ ,) 


,-l 


-2 


After  some  manipulation  we  find  that  for  y  >  0 

k 


(A.  6) 


Pr[Y,<y]  *  1  ~  l  b3[l+($y/a  )+*(*y/a  )  ]e  3 
J  j-1  -  3  3 

"  3  I  I  b2b  , (1-a  ,/a.)tl+(%y/a  )+(l-a./a  ,)  1}e  ^  3 

I  J  J  J  J  J  J  J 

k  ,  ,  “%y/a. 

-  3  I  b,{  l  b  ,(l-a  ,/a  )  V  e  3  . 

3-1  3  jVj  3  3  3 


Similar  formulae  can  be  obtained  for  any 
k 


•  T  a  W<B> 

jii  *  * 
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The  length  oi  the  formula  increases  tjuitc  rapidly  with  m. 

In  the  particular  case  (16),  which  can  be  written,  in  our  present  notation 

(A. 7)  G  -  l  (n-J+1)'1  W(B° 

jBl  J 

we  obtain  from  (A.l),  putting  k.  «  r  and  a^  *»  (n-J+l)  2, 


j  n  t+i  3  t 


For  at  «■  l(and  y>0),  from  (A.  2) 


PrtY^y]  -  PrlY^y]  -  (")  j^(-l)J+1(*>  e’%(n*J+1)y 


For  m  •  2,  from  (A. 4) 


(A. 10) 


,  r  -%(n-i+l)y  ,  .  , 

PrtY2>y]  -  l  e  l(j>  {1+%(a’j+JL)y} 


+  2(-l)j(^)j(n-j+l)“1  I 

3  jVj  3 

Some  particular  cases  (used  in  calculating  Tables  3  and  4)  are  set  out  below. 
(Note  that  G  In  (27)  la  obtained  from  (16)  by  changing  r  to  (r-1).) 


r  n  ?x[Y2  >  y] 

4  4  (8y-  •i|^)e"y/2+36(y-l)e“y+8(3y+8)e"3y/2+(2y+y)e'’2y 

6  200(3y-2)e"3y/2+2025(2y-3)e”2y-H48(5y+12)e“4y/2+100<3y+23)e_3y 

10  2102*I(fy  -  iIfi)e-7y/2+(9y  -  ^)e-4y+(8y^)e^2+(^4f|)e-5y] 

i4  iooi2,((|jy. 

3  4  36(y-5)e“y+32(3y+2)e'3y/2+9(2y+13)e"2y 

6  225 (2y- 11) e"2y+288(5y+2) e"5y /2+100 ( 3y+19) e“3y 

10  1202*ufey-  i2I)e-4y+(2^)e-9y/2+(|^)e.5yj 


207  i00 


14  3642*[(|y-  ~|)e'6y+(-i|y  +  jf|)ft“13y/2+(f5y  * 
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For  m  »  3, 


(A. 11) 


^  n  T  k  -V(n-3+l)y  ,  .  . 

Pr{Y  >y]  -  <-l)r<V  l  e  [<-l)J(*)3(-^)3{l+*(n-j+l)y 

j»l  J  n-j-ri 

+  *t*(n-j+l)y]2)  +  3<^)2C— Jrr)2  I  (-l>J’(!f,)  * 

J  n-j+l  jyjI  J  J  J 

_  ,  .2 


{l+%(n-j+l)y)  -  3(.r  -Hrr 
3  n-j+1 


r>2  -J—  W-Dj,(f.) 


j"  cj-j  *)- 


+  ^ 1  j^T  >2) 


In  particular,  for  r  ■  4,  a  ■  4 


M*3>y]-(8y2-144y+£^)e“*7+<-108y2+432y~3456)e~y+(72y2+528y+2944)e“3y/2 
(-2y2-46y-  •^^)e~‘‘y 


And  for  r  *■  3,  n  -  4 


Pr[Y  >y]  -  64[<||y2-27y+135)a“y-(9y2+12y+224)e“3y/2+||(y2+19y+ — )e"2y] 
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Appendix  II 

k  (m) 

Using  the  notation  of  Appendix  I,  if  a,  ~a^». . -m\ma  then  Ym  “  a  I  Wj 

is  distributed  exactly  as  a  x->  •  For  general  values  of  {a.}  we  might  hope 

2 

to  obtain  a  useful  approximation  by  supposing  Y^  to  be  distributed  as  c 

with  c  and  v  chosen  to  make  first  and  second  moments  agree.  That  is 

k  ~  r  2 

cv  -  E(Y  ]  -2m  T  a.;  2c  v  -  var(Y  )  -4m  £  a 
®  j-l  J  “  j-1  3 


or,  equivalently 


k  ,  k 

c  2  .  p 


2,  r  2 


l  I  “  2(  ^  a  )  /  ^  a  . 

J-l  4  J-l  3  j-l  3  j-l  3 


Approximations  of  this  kind  have  been  used  quite  widely  with  satisfactory 
results  ([4  ][  5]  etc.). 

In  order  to  check  how  suitable  the  approximation  is  in  our  particular  case 
some  numerical  comparisons  are  presented  here. 

For  sums  of  the  form  Y  -  <n-J+l)“ Vm) .  with  n  an  integer  at  least 

m  J-l  3 

equal  to  k  the  least  accurate  approximation  would  be  expected  when  n  -  k. 

As  n  increases,  so  that  the  ratios  n:  (n-l):...t  (n-k+1)  approach  1,  the 
distribution  should  become  closer  to  a  c  xv  distribution.  Table  A.l  contains 
exact  and  approximate  values  of  PrlYB>y]  for  k-5  with  m-1,2  and  n-5,8,10 
to  exemplify  this  point. 

The  exact  formulae  are 


c  •  5: 


Pr[Yj>y]  -  1  -  (l-e'y/2)5 

Pr[Y2>y]  -  (rj y-  -^p-)e~y/2+(100y-  -~)e“y+(150y+lC0)e~3y/2 


+  (50y+~)e~2y  +  (|y+ 


l|l)e-5y/2 


5 


n  -  3:  ?r[Y1>y]  -  70c"2y  -224e"5y/2  +  280c~2y  -  160e~/y/2  +  35e“4y 

Pr[Y2>y]  -  562[(y  y  -  ^)e'2y+(40y  -  -~)e‘5y/2+(75y+25)e“3y 

.  +  (200  15200 -7y/2  2b  2575  -4y 

147  7  '■16  y  192;  J 

n  -  10:  Pr[Y^>y]  -  210e-3y  -  720 e~7y/2  +  945e~4y  -  560e"9y/2  +  126e"Sy 


-4y  -  560e*9y/2  -f  126e"5y 


Pr[Y2>yl  -  2522l(|f  y  -  -  ^)e-7y'2+<^y  - 

+  (220y+6|20)s-9y/2+(5y+^)e-5y] 


For  m  ■  3,  and  n  -  5. 

Pr(Vy]  .  <A§V  .  ,  ♦  2Sf§2  J^-CSOOy2  -  <000,  *  »!  ).-> 

+  (1125y2  +  1500y  +  34750) e"3y/,2-(250y2+2750y  +  )e_2y 

«.  (15  2  645  +  6887  -5y/2  . 

V  8y  6^  12  ,e 


2 

For  calculations  of  approximate  values  (based  on  c  ^  distributions) 
the  following  values  were  used: 


n  -  5 


c 

0.6410 

0.1880 

0.1332 


7. 124ta 
9.410m 
9.692m 
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Table  A.l:  Comparison  of  Exact  and  Approximate  Values  of 


5 

Pr[  l  (n-j+1) 
j=l 

3 

V 

« — J 

n 

-  5 

n  ■ 

8 

n 

-  10 

m  “ 

1 

y 

Exact 

Approx. 

Exact 

Approx. 

Exact 

Approx, 

0.5 

.9995 

.998 

.933 

.982 

.951 

.950 

1 

.991 

.982 

.836 

.834 

.649 

.650 

1.5 

.959 

.943 

.575 

.576 

.312 

.312 

2 

.899 

.882 

.333 

.336 

.118 

.118 

3 

.717 

.712 

.080 

.080 

.011 

.011 

4 

.517 

.526 

.015 

.014 

.0008 

.0007 

5 

.343 

.363 

.002 

.002 

- 

- 

6 

.225 

.238 

.0004 

.0003 

- 

- 

7 

.142 

.149 

- 

- 

- 

a  «* 

2 

1 

- 

.9992 

.9990 

.993 

.993 

2 

.9997 

.9990 

.933 

.932 

.743 

.743 

3 

.995 

.991 

.648 

.649 

.279 

.279 

4 

.973 

.964 

.309 

.311 

.058 

.057 

6 

.82  8 

.821 

.031 

.030 

.0009 

.0008 

8 

.580 

.587 

.0016 

.0014 

- 

- 

10 

.344 

.356 

- 

- 

- 

- 

12 

.182 

.188 

- 

- 

- 

- 

14 

.088 

.089 

- 

- 

- 

- 

a  ■ 

3 

• 

[  Exact 

4 

.9998 

6  8 
.992  .942  . 

10  12  14 

811  .620  .421 

16  18 
.259  .147 

20 

.079 

n  ■ 

5 

[  Approx. 

.9994 

.988  .934  . 

809  .626  .431 

.267  .151 

.078 

The  Improvement  in  accuracy  with  □  Is  marked,  but  vitb  m,  less  so. 

This  suggests  that  is  might  be  worthwhile  devoting  special  efforts  to  obtaining 
exact  values  for  significance  limits,  while  relying  on  approximations  for 
evaluation  of  power. 
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